tissue type
SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space
Redekop, Ekaterina, Pleasure, Mara, Wang, Zichen, Flores, Kimberly, Sisk, Anthony, Speier, William, Arnold, Corey W.
These authors contributed equally to this work.
The rapid growth of digital pathology and advances in self-supervised deep learning have enabled the development of foundational models for various pathology tasks across diverse diseases. While multimodal approaches integrating diverse data sources have emerged, a critical gap remains in the comprehensive integration of whole-slide images (WSIs) with spatial transcriptomics (ST), which is crucial for capturing molecular heterogeneity beyond standard hematoxylin & eosin (H&E) staining. We introduce SPADE, a foundation model that integrates histopathology with ST data to guide image representation learning within a unified framework, in effect creating an ST-informed latent space. Pre-trained on the comprehensive HEST-1k dataset, SPADE is evaluated on 20 downstream tasks, demonstrating significantly superior few-shot performance compared to baseline models, highlighting the benefits of integrating morphological and molecular information into one latent space.
Introduction
High-resolution whole slide images (WSIs) have propelled the development of powerful deep learning foundation models in computational pathology, demonstrating robust performance across diverse tissue types and tasks [1, 2, 3, 4]. These models are typically trained using self-supervision, enabling learning from large unlabeled datasets and producing embeddings robust to institutional variations, including differences in staining procedures and other image-quality factors [5, 6, 7, 8]. By visually capturing cellular arrangement, WSIs enable the study of spatial organization and disorganization of cells in tissues, characterizations that are especially relevant in cancer research [9, 10]. In clinical settings, WSIs are commonly stained with hematoxylin & eosin (H&E), a two-color stain that highlights nuclei and cytoplasm but offers a limited view of molecular-level heterogeneity [11].
As tumor tissues are known to exhibit high variability within and across patients, deciphering the heterogeneity at the molecular level is critical for improving deep learning applications that can more precisely inform diagnosis, treatment, and prognosis [12, 13]. While H&E provides crucial morphological insights, its inability to capture molecular heterogeneity limits its utility in fully characterizing tissue complexity. Spatial transcriptomics addresses this gap by providing spatially resolved gene expression data, allowing for additional molecular context for a given tissue specimen. Although both ST and H&E data have independently proven useful in various applications, their combined potential for creating a more comprehensive representation learning framework remains unexplored. To this end, we introduce SPADE, a vision-ST foundation model that uses a mixture of experts, each trained via contrastive learning, to unify ST data and H&E images to produce slide representations that encompass both modalities.
Standardized Multi-Layer Tissue Maps for Enhanced Artificial Intelligence Integration and Search in Large-Scale Whole Slide Image Archives
Fiala, Gernot, Plass, Markus, Harb, Robert, Regitnig, Peter, Skok, Kristijan, Zoughbi, Wael Al, Zerner, Carmen, Torke, Paul, Kargl, Michaela, Müller, Heimo, Brazdil, Tomas, Gallo, Matej, Kubín, Jaroslav, Stoklasa, Roman, Nenutil, Rudolf, Zerbe, Norman, Holzinger, Andreas, Holub, Petr
A Whole Slide Image (WSI) is a high-resolution digital image created by scanning an entire glass slide containing a biological specimen, such as tissue sections or cell samples, at multiple magnifications. These images can be viewed, analyzed, and shared digitally, and are used today for Artificial Intelligence (AI) algorithm development. WSIs are used in a variety of fields, including pathology for diagnosing diseases and oncology for cancer research. They are also utilized in neurology, veterinary medicine, hematology, microbiology, dermatology, pharmacology, toxicology, immunology, and forensic science. When assembling cohorts for the training or validation of an AI algorithm, it is essential to know what is present on such a WSI. However, there is currently no standard for this metadata, so such selection has mainly been done through manual inspection, which is not suitable for large collections with several million objects. We propose a general framework to generate a 2D index map for WSIs and a profiling mechanism for specific application domains. We demonstrate this approach in the field of clinical pathology, using common syntax and semantics to achieve interoperability between different catalogs. Our approach augments each WSI collection with a detailed tissue map that provides fine-grained information about the WSI content. The tissue map is organized into three layers: source, tissue type, and pathological alterations, with each layer assigning segments of the WSI to specific classes. We illustrate the advantages and applicability of the proposed standard through specific examples in WSI catalogs, Machine Learning (ML), and graph-based WSI representations.
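As a rough illustration of the three-layer organization described in this abstract, the following Python sketch models a tissue map as per-layer lists of labeled segments. The class names (`Segment`, `TissueMap`), the rectangular segment geometry, and the example labels are hypothetical and are not taken from the proposed standard.

```python
from dataclasses import dataclass, field

# Layer names follow the abstract; everything else is a hypothetical illustration.
LAYERS = ("source", "tissue_type", "pathological_alteration")


@dataclass(frozen=True)
class Segment:
    """A rectangular WSI region (pixel coordinates) assigned to one class."""
    x: int
    y: int
    width: int
    height: int
    label: str

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass
class TissueMap:
    """Maps each layer name to the segments annotated in that layer."""
    layers: dict = field(default_factory=lambda: {name: [] for name in LAYERS})

    def add(self, layer: str, segment: Segment) -> None:
        if layer not in self.layers:
            raise KeyError(f"unknown layer: {layer}")
        self.layers[layer].append(segment)

    def classes_at(self, px: int, py: int) -> dict:
        """Return, per layer, the labels of all segments covering a pixel."""
        return {
            name: [s.label for s in segments if s.contains(px, py)]
            for name, segments in self.layers.items()
        }

    def has_label(self, layer: str, label: str) -> bool:
        """Cohort-selection style query: does this slide contain the class?"""
        return any(s.label == label for s in self.layers[layer])


# Toy slide with one segment per layer (made-up labels and coordinates).
tissue_map = TissueMap()
tissue_map.add("source", Segment(0, 0, 1000, 1000, "colon"))
tissue_map.add("tissue_type", Segment(100, 100, 300, 300, "mucosa"))
tissue_map.add("pathological_alteration", Segment(150, 150, 50, 50, "adenoma"))
```

A catalog could then filter slides with a query such as `tissue_map.has_label("pathological_alteration", "adenoma")` instead of relying on manual inspection; a real implementation would likely use polygons or raster masks rather than rectangles.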
scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction
Wang, Qing, Pan, Yining, Zhou, Minghao, Tang, Zijia, Wang, Yanfei, Wang, Guangyu, Song, Qianqian
Drug resistance presents a major challenge in cancer therapy. Single cell profiling offers insights into cellular heterogeneity, yet the application of large-scale foundation models for predicting drug response in single cell data remains underexplored. To address this, we developed scDrugMap, an integrated framework featuring both a Python command-line interface and a web server for drug response prediction. scDrugMap evaluates a wide range of foundation models, including eight single-cell models and two large language models, using a curated dataset of over 326,000 cells in the primary collection and 18,800 cells in the validation set, spanning 36 datasets and diverse tissue and cancer types. We benchmarked model performance under pooled-data and cross-data evaluation settings, employing both layer freezing and Low-Rank Adaptation (LoRA) fine-tuning strategies. In the pooled-data scenario, scFoundation achieved the best performance, with mean F1 scores of 0.971 (layer freezing) and 0.947 (fine-tuning), outperforming the lowest-performing model by over 50%. In the cross-data setting, UCE excelled post fine-tuning (mean F1: 0.774), while scGPT led in zero-shot learning (mean F1: 0.858). Overall, scDrugMap provides the first large-scale benchmark of foundation models for drug response prediction in single-cell data and serves as a user-friendly, flexible platform for advancing drug discovery and translational research.
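For readers unfamiliar with the reported metric, the sketch below computes a binary F1 score and its unweighted mean across datasets. The labels are made-up toy data, and whether scDrugMap averages F1 in exactly this way is an assumption, not something stated in the abstract.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(datasets):
    """Unweighted mean of per-dataset F1 scores."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in datasets]
    return sum(scores) / len(scores)


# Two toy datasets of (true labels, predicted labels); 1 = resistant, 0 = sensitive.
toy_datasets = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # one resistant cell missed
    ([1, 0, 1, 0], [1, 0, 1, 0]),  # perfect predictions
]
toy_mean_f1 = mean_f1(toy_datasets)
```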
Universal Lesion Segmentation Challenge 2023: A Comparative Research of Different Algorithms
Shi, Kaiwen, Li, Yifei, Ho, Binh, Wang, Jovian, Guo, Kobe
Medical image segmentation is a crucial task in medical image processing. Thanks to the advent of CNNs [12], U-Net [17], and variants such as V-Net [14], 3D U-Net [5], Res-UNet [15], and Dense-UNet [13], segmentation tasks can now be performed with precision. More recently, with the introduction of transformer-based models, the medical imaging community has enjoyed considerable success in segmentation tasks. Networks such as Medical Transformers [18] and SwinUnet [1] push the state of the art further. Others have applied learning methodologies from other fields, such as dictionary learning, to medical images. For example, the knowledge embedding network (KEN) [16] exploits the information embedded in each layer via dictionary learning to provide a more semantically meaningful network.
Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification
Vadori, Valentina, Peruffo, Antonella, Graïc, Jean-Marie, Finos, Livio, Grisan, Enrico
Recent advancements in foundation models have transformed computer vision, driving significant performance improvements across diverse domains, including digital histopathology. However, the advantages of domain-specific histopathology foundation models over general-purpose models for specialized tasks such as cell analysis remain underexplored. This study investigates the representation learning gap between these two categories by analyzing multi-level patch embeddings applied to cell instance segmentation and classification. We implement an encoder-decoder architecture with a consistent decoder and various encoders. These include convolutional, vision transformer (ViT), and hybrid encoders pre-trained on ImageNet-22K or LVD-142M, representing general-purpose foundation models. These are compared against ViT encoders from the recently released UNI, Virchow2, and Prov-GigaPath foundation models, trained on patches extracted from hundreds of thousands of histopathology whole-slide images. The decoder integrates patch embeddings from different encoder depths via skip connections to generate semantic and distance maps. These maps are then post-processed to create instance segmentation masks where each label corresponds to an individual cell and to perform cell-type classification. All encoders remain frozen during training to assess their pre-trained feature extraction capabilities. Using the PanNuke and CoNIC histopathology datasets, and the newly introduced Nissl-stained CytoDArk0 dataset for brain cytoarchitecture studies, we evaluate instance-level detection, segmentation accuracy, and cell-type classification. This study provides insights into the comparative strengths and limitations of general-purpose vs. histopathology foundation models, offering guidance for model selection in cell-focused histopathology and brain cytoarchitecture analysis workflows.
HisynSeg: Weakly-Supervised Histopathological Image Segmentation via Image-Mixing Synthesis and Consistency Regularization
Fang, Zijie, Wang, Yifeng, Xie, Peizhang, Wang, Zhi, Zhang, Yongbing
Tissue semantic segmentation is one of the key tasks in computational pathology. To avoid the expensive and laborious acquisition of pixel-level annotations, a wide range of studies attempt to adopt the class activation map (CAM), a weakly-supervised learning scheme, to achieve pixel-level tissue segmentation. However, CAM-based methods are prone to suffer from under-activation and over-activation issues, leading to poor segmentation performance. To address this problem, we propose a novel weakly-supervised semantic segmentation framework for histopathological images based on image-mixing synthesis and consistency regularization, dubbed HisynSeg. Specifically, synthesized histopathological images with pixel-level masks are generated for fully-supervised model training, where two synthesis strategies are proposed based on Mosaic transformation and Bézier mask generation. Besides, an image filtering module is developed to guarantee the authenticity of the synthesized images. In order to further avoid the model overfitting to the occasional synthesis artifacts, we additionally propose a novel self-supervised consistency regularization, which enables the real images without segmentation masks to supervise the training of the segmentation model. By integrating the proposed techniques, the HisynSeg framework successfully transforms the weakly-supervised semantic segmentation problem into a fully-supervised one, greatly improving the segmentation accuracy. Experimental results on three datasets prove that the proposed method achieves a state-of-the-art performance. Code is available at https://github.com/Vison307/HisynSeg.
Controlling sharpness, SNR and SAR for 3D FSE at 7T by end-to-end learning
Dawood, Peter, Blaimer, Martin, Herrler, Jürgen, Liebig, Patrick, Weinmüller, Simon, Malik, Shaihan, Jakob, Peter M., Zaiss, Moritz
Purpose: To non-heuristically identify dedicated variable flip angle (VFA) schemes optimized for the point-spread function (PSF) and signal-to-noise ratio (SNR) of multiple tissues in 3D FSE sequences with very long echo trains at 7T. Methods: The proposed optimization considers predefined SAR constraints and target contrast using an end-to-end learning framework. The cost function integrates components for contrast fidelity (SNR) and a penalty term to minimize image blurring (PSF) for multiple tissues. By adjusting the weights of PSF/SNR cost-function components, PSF- and SNR-optimized VFAs were derived and tested in vivo using both the open-source Pulseq standard on two volunteers as well as vendor protocols on a 7T MRI system with parallel transmit extension on three volunteers. Results: PSF-optimized VFAs resulted in significantly reduced image blurring compared to standard VFAs for T2w while maintaining contrast fidelity. Small white and gray matter structures, as well as blood vessels, are more visible with PSF-optimized VFAs. Quantitative analysis shows that the optimized VFA yields 50% less deviation from a sinc-like reference PSF than the standard VFA. The SNR-optimized VFAs yielded images with significantly improved SNR in a white and gray matter region relative to standard VFAs (81.2 ± 18.4 vs. 41.2 ± 11.5, respectively), at the cost of increased image blurring. Conclusion: This study demonstrates the potential of end-to-end learning frameworks to optimize VFA schemes in very long echo trains for 3D FSE acquisition at 7T in terms of PSF and SNR. It paves the way for fast and flexible adjustment of the trade-off between PSF and SNR for 3D FSE.
Deep-Learning Approach for Tissue Classification using Acoustic Waves during Ablation with an Er:YAG Laser (Updated)
Seppi, Carlo, Cattin, Philippe C.
Today's mechanical tools for bone cutting (osteotomy) cause mechanical trauma that prolongs the healing process. Medical device manufacturers aim to minimize this trauma, with minimally invasive surgery using laser cutting as one innovation. This method ablates tissue using laser light instead of mechanical tools, reducing post-surgery healing time. A reliable feedback system is crucial during laser surgery to prevent damage to surrounding tissues. We propose a tissue classification method analyzing acoustic waves generated during laser ablation, demonstrating its applicability in an ex-vivo experiment. The ablation process with a microsecond pulsed Er:YAG laser produces acoustic waves, acquired with an air-coupled transducer. These waves were used to classify five porcine tissue types: hard bone, soft bone, muscle, fat, and skin. For automated tissue classification, we compared five Neural Network (NN) approaches: a one-dimensional Convolutional Neural Network (CNN) with time-dependent input, a Fully-connected Neural Network (FcNN) with either the frequency spectrum or principal components of the frequency spectrum as input, and a combination of a CNN and an FcNN with time-dependent data and its frequency spectrum as input. Consecutive acoustic waves were used to improve classification accuracy. Grad-CAM identified the activation map of the frequencies, showing low frequencies as the most important for this task. Our results indicated that combining time-dependent data with its frequency spectrum achieved the highest classification accuracy (65.5%-75.5%). We also found that using the frequency spectrum alone was sufficient, with no additional benefit from applying Principal Components Analysis (PCA).
AI-based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Dippel, Jonas, Prenißl, Niklas, Hense, Julius, Liznerski, Philipp, Winterhoff, Tobias, Schallenberg, Simon, Kloft, Marius, Buchstab, Oliver, Horst, David, Alber, Maximilian, Ruff, Lukas, Müller, Klaus-Robert, Klauschen, Frederick
While previous studies have demonstrated the potential of AI to diagnose diseases in imaging data, clinical implementation is still lagging behind. This is partly because AI models require training with large numbers of examples only available for common diseases. In clinical reality, however, only a few diseases are common, whereas the majority of diseases are less frequent (long-tail distribution). Current AI models overlook or misclassify these diseases. We propose a deep anomaly detection approach that only requires training data from common diseases to also detect all less frequent diseases. We collected two large real-world datasets of gastrointestinal biopsies, which are prototypical of the problem. Herein, the ten most common findings account for approximately 90% of cases, whereas the remaining 10% contained 56 disease entities, including many cancers. 17 million histological images from 5,423 cases were used for training and evaluation. Without any specific training for the diseases, our best-performing model reliably detected a broad spectrum of infrequent ("anomalous") pathologies with 95.0% (stomach) and 91.0% (colon) AUROC and generalized across scanners and hospitals. By design, the proposed anomaly detection can be expected to detect any pathological alteration in the diagnostic tail of gastrointestinal biopsies, including rare primary or metastatic cancers. This study establishes the first effective clinical application of AI-based anomaly detection in histopathology that can flag anomalous cases, facilitate case prioritization, reduce missed diagnoses and enhance the general safety of AI models, thereby driving AI adoption and automation in routine diagnostics and beyond.
Semiparametric Principal Component Analysis
We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian.
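The abstract does not say how the model is estimated. One standard route for Gaussian copula models, assumed here purely for illustration, is to estimate the latent correlation matrix from Kendall's tau via the identity Σ_jk = sin(π τ_jk / 2) and then extract principal directions from that matrix. Because ranks are unchanged by strictly increasing transformations, the estimate does not depend on the unknown marginal transformations. The function names and toy data below are hypothetical, and the sketch uses only the Python standard library.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau for two equal-length sequences (assumes no ties)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def latent_correlation(columns):
    """Rank-based estimate of the latent Gaussian correlation matrix."""
    d = len(columns)
    corr = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            r = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
            corr[j][k] = corr[k][j] = r
    return corr


def leading_eigenvector(matrix, iterations=500):
    """Power iteration; assumes the dominant eigenvalue is positive."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Toy data: latent Gaussians, then strictly increasing marginal transformations.
random.seed(0)
n = 200
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in z1]  # corr(z1, z2) = 0.8
z3 = [random.gauss(0, 1) for _ in range(n)]            # independent of z1, z2
latent = [z1, z2, z3]
observed = [[math.exp(v) for v in z1], [v ** 3 for v in z2], z3]

corr_latent = latent_correlation(latent)
corr_observed = latent_correlation(observed)
direction = leading_eigenvector(corr_observed)  # first principal direction
```

In this toy setup the matrices estimated from `latent` and `observed` coincide, which is the property that makes a rank-based estimator attractive under the semiparametric model; ordinary PCA on the raw `observed` values would not share it.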